In Homework 2 we will continue our focus on Baltimore but you will need to identify, develop, and answer a question on your own, using data. You will be free to choose a question of your own interest, subject to the following constraints:
The question you are asking should be Baltimore-related, or specifically relevant to the residents of Baltimore.
The question should be answerable with data. In other words, there should be an underlying hypothesis that is falsifiable with observed data.
You will need to meet with the course faculty to discuss your proposed question of interest before going ahead and completing the analysis. The primary purpose of this meeting is to ensure that the question being addressed satisfies the requirements and that the scope of your question is appropriate for the timeframe available to do the analysis. Please use the signup sheet to setup a meeting.
State your question here. Provide any context or other information that may be useful for understanding the significance of your question.
The Baltimore Police Department (BPD) is the 8th largest municipal force in the United States, staffed by over 3,200 sworn and civilian personnel. Officers are assigned to one of nine geographical districts in the city: Central District, Southeastern District, Eastern District, Northeastern District, Northern District, Northwestern District, Western District, Southwestern District, and Southern District. Despite having a pretty large municipal force, BPD’s jurisdiction covers Maryland’s largest city with a population of 614,000. Baltimore is also known to be a dangerous city with high crime rates, so there is a great burden on BPD officers. In a press conference held in mid-July of this year (2019), Baltimore Police Sgt. Michael Mancuso stated: “The Baltimore Police Department is currently 500 Police Officers short of the number required for effectiveness, with 400 of those positions needed in the Patrol Division… As it stands now, there are not enough Officers to even respond to the number of calls to 911, not to mention the addition of micro-zones, community engagement, and proactive policing.” [1]
This got me wondering about the current status of 911 calls in Baltimore. The BPD started posting up-to-date 911 call data on the OpenBaltimore portal starting on July 13, 2015. Given that I have access to call data, I decided to perform some data analysis. More specifically, the primary goal of this analysis is to assess whether the distribution of 911 call types is different across the nine different police districts. The BPD may not be able to fill the 500 vacancies due to struggling recruiting efforts or financial limitations, but given the current team of officers, there may be more efficient ways to allocate officers to different districts and assign them to different tasks such that 911 calls are responded to faster and the cases linked to the calls are resolved sooner. This analysis will hopefully lead to some interesting insights that will provide the BPD with some guidance in such area.
# load packages
library(readr)
library(plyr)
library(tidyverse)
library(reshape)
library(ggmap)
library(maptools)
library(rgeos)
library(rgdal)
library(lubridate)
library(plotly)
library(scales)
library(kableExtra)
library(utils)# download dataset
download.file("https://www.googleapis.com/drive/v3/files/1J5yd_9OYBXLr--j7Y1keLHqO8_w8zbYG?alt=media&key=AIzaSyAVFaRNkalQTKJ6M670d7m_g_7ttCoVyeI", "911_Police_Calls_tidied.csv")
download.file("https://www.googleapis.com/drive/v3/files/1x5sPD_TuLUxTJ9xO_spZZFj6IOm49Wn7?alt=media&key=AIzaSyAVFaRNkalQTKJ6M670d7m_g_7ttCoVyeI", "911_Police_Calls_for_Service.csv")Data was obtained from Open Baltimore: https://data.baltimorecity.gov/Public-Safety/911-Police-Calls-for-Service/xviu-ezkt. This analysis uses data downloaded on November 8, 2019. Records of 911 calls from July 1, 2013 to date are available in the dataset. Since the month of November is not over yet, I chose to limit the analysis to calls up to October 31, 2019. This leaves me with around 6.76 million call records over 2,314 days (76 months/ 6 years, 4 months).
I checked to see that there are indeed nine unique police districts documented in the dataset. I’m interested in the categorization of the 911 calls so I examined the ‘Description’ column of the dataset, and there ends up being 15,155 unique categories of call types. A quick examination of the call descriptions showed that in many instances, calls that are supposed to be in the same category are described differently due to misspellings or slight differences in wording. For example, “destruct propty”, “destruct prop”, “*dest of prop“,”destruction prop" should all represent the same case, but are worded differently in the dataset. I manually tidied the call type descriptions by reading through the list of call types with frequencies higher than 100 and judging which call types can be combined. I then focused on only the top 20 most prevalent call types and skimmed through the full list of call types again to see if there were any additional call types that could be merged together with these 20 call types. This was a bit of a tedious process, with some call types such as “investigate” being recorded in over 200 variations. As a result, I would suggest that the BPD create a selection menu for the description of calls, or at least for the most common call types that appear on a regular basis, to facilitate better data collection in the future.
After cleaning the data, I chose to retain only 911 call records that were categorized as one of the top 20 most prevalent call types. This leaves me with around 4.95 million call records, which represents 73.3% of the original raw dataset. In other words, the 20 most prevalent call types account for over 70% of the 911 calls in Baltimore.
# read in data
data <- read_csv("911_Police_Calls_for_Service.csv")
data$CallDate <- as.Date(data$CallDateTime, format = "%m/%d/%Y")
# we do not have full data for June 2013 and November 2019, so limit analysis to months in between
data <- data[-which(paste0(year(data$CallDate),"-",month(data$CallDate))=="2013-6"),]
data <- data[-which(paste0(year(data$CallDate),"-",month(data$CallDate))=="2019-11"),]
df <- data
# clean data
df$Description <- tolower(df$Description)
description <- as.data.frame(sort(table(df$Description), decreasing=T))
no_voice <- c("911/no voice","911/no voice","911 no voice","911/no voice.","911 no voice.","no voice","911/no voice...","91/no voice","911 /no voice","911 no voice...","911 novoice","911/ no voice","911/neyeo voice","911/no voice....","no voice","no voice.","no voice..")
df$Description[df$Description %in% no_voice] <- "no voice"
disorderly <- c("disorderly","*disorderly",".disorderly","disorderl","disorderly/","*disordelry","diso0rderly","disordely","disorserly")
df$Description[df$Description %in% disorderly] <- "disorderly"
common_assault <- c("common assault",".common assault","*common assault","comm.on assault","common assualt","common assult","common assaulpt")
df$Description[df$Description %in% common_assault] <- "common assault"
auto_accident <- c("auto accident","auto acc/injury","auto acc/inj ped","auto acc/death","auto acc ped","auto acc pedes","*auto accident","auto accidenjt","*auto acc","auto acciden","auto accidenet","autoaccdeathped","ped struck","*ped struck")
df$Description[df$Description %in% auto_accident] <- "auto accident"
silent_alarm <- c("silent alarm","*silent alarm","silent alarm pan","silent alarmp","silent alarpm")
df$Description[df$Description %in% silent_alarm] <- "silent alarm"
other <- c("other","*other","other-see text","other/see text","other/ see text","see textother","othe","othersee text","see other","other see text","other-s ee text","other-seetex","othsee text","other- see text")
df$Description[df$Description %in% other] <- "other"
narcotics_outside <- c("narcoticsoutside","narcotics outsid")
df$Description[df$Description %in% narcotics_outside] <- "narcotics outside"
business_check <- c("business check","*busn chk","*check busn","*business chec","busn check","check business","chk busn","*business check")
df$Description[df$Description %in% business_check] <- "business check"
family_disturbance <- c("family disturb","*family dispute","family dispute","family disturban","family dist","*family dispte","*family dispue","family disturbp")
df$Description[df$Description %in% family_disturbance] <- "family disturbance"
repairs_service <- c("repairs/service","repair order","*repair order")
df$Description[df$Description %in% repairs_service] <- "repairs/service"
burglary <- c("burglary","burglar","burgulary","burglarpy","*burg report","*burglary","buglarary","burg","burg.","burglaryp","poss burglary","attempt burglary","*poss burglary","attmpt burglary","burglary/poss","burglary attempt","attemp burglary","attmptd burglary","burglary attem","att burglary","attempt burg","attempted burgla","burg alarm","burglarpy","burglary alarm","burglary attemp","burglary poss","burglary-poss","*attemp burglary","*attemp burlgary","*attempted burgl","*burgalar alarm","*commercial burg","attemped burg","attmpt burglry","attmptd bulgry","attmpted burglry","attp burglary","burglar alarm","burglary att","burglary-attempt","burglary/gun","poss/burglary","possble burglary","possibleburglary","poss. burglary","attmpt brglry","attempted brglry","attmpted brglary","attmpted brglry","possible brglry")
df$Description[df$Description %in% burglary] <- "burglary"
hit_and_run <- c("hit and run","hit and run ped","hit and run/ped","*hit and run","*hit & run","hit & run","*fd veh hit/run","*hit/run","*hit&run ped","*ped hit","*susp in hit/run","hit & runiat","hit and run fu","hit n run accid","hit run victim","hit/run","hit/run see txt","*poss. hit and r","poss hit and run")
df$Description[df$Description %in% hit_and_run] <- "hit and run"
larceny <- c("larceny","larceny f/auto","larcency","bike larceny","*larceny","attmpt larceny","attempt larcen","poss larcency","*bike larceny","*larceny from au","attmpt larcency","*larceny attemp","after larceny","atmmpt larcency","attemptlarceny","attm bike larcny","larc","larceny f auto","larceny from aut","larceny.","larceny/gun","larceny/poss","lareny","poss autolarceny","poss bike larcny","poss larceny aut")
df$Description[df$Description %in% larceny] <- "larceny"
investigate <- c("investigate","*investigate","investgate","invesitgate","investigte","investagate","investiagte","invesigate","investigate.","investigat","investigae","investigation","*invesitgate","*invesigate","*investgate","*investigte","*investgiate","investiate","investigat e","*investigae","*investiagte","investigatge","investgiate","*investigate.","*investigat","investogate","*investigation","*investagate","investigate.","investiagate","investigage","*investiate","investigated","investigtae","investigagte","investigaste","investigate3","*iinvestigate","*investigatge","investugate","investgte","investigatr","*investigate3","iinvestigate","iinvesitigate","investgigate","*investigagte","invesstigate","investiage","investifate","investigates","investigaye","investtigate","*investuigate","investigaate","investigaet","investigarte","investigatee","investigtate","*investigaet","*investigate .","investiaget","investigatye","investigsate","investigste","*investiagate","*investigast","*investigat e","*investugate","inves tigate","investi","investiogate","investuigate","*invesrtigate","*investigate 3","*investigate y","*investigater","`investigate","invesdtigate","investage","investifgate","investigate 3","investigater","investigatey","investigatte","investiigate","investisgate","investiugate","invests","other/investigat","**investigation","**police investi","*invesetigate","*investicate","*investifate","*investigate--","*investigate`","*investigated","*investigatee","*investigateq","*investigates","*investigatr","*investigtate","*investitgate","*investiugate","*investogate","*investtigate","*see text/invest","`invest","invesatigate","invesr","invesrigate","invesrtigate","invest igate","invest y","investagation","investate","investigat4e","investigate//","investigatre","investiggate","investingate","investsigat","investy","invesy","see investigate","invest","*invest","invewtigate","invevestigate","invetstige","invetsiagte","invetigate q","invetigat","invetigarte","invetiagte","invetagate","invesytigate","invesvtigate","inveswt","investr","investoigate","investn","nvestivgate","investitegae","investitate","investisate","investiote","investiogte","investinvest","investington","investing","investimgate","investiigtae","investihgate","investihate","investigyate","investigtte","investigt","investigigate","investighate","investigete","investigatt","investigating","investigatg=e","investigatet","nvestigatemor","investigatem","investigateit","investivgate","investigatemor","investigatei8uy","investigatei","investigatefffff","investigateerrrr","investigateeeeee","investigate5","investigate33","investigate/","investigate***","investigate...","
investigate,,,,,","investigate,","investigate---","investigatd","investigatage","investigat4","investigat3e","investigalef","investig.","investig","investicgate","investicate","investica","investiagtet","investi3gate","investi3.gate","investgation","investgaite","investg.","investg","investfate","investeigate","investaigate","invest trouble","*invest trouble","invest problem","invest.trouble","invest trb","*invest.trouble","*invest problem","invest. problem","invest.","*invest. problem","*invest.","*invest trb","invest unk trbl","*inves","*invest; trouble","*invest unk trbl","invest troub ukn","*inves trouble","invetsigate","invest ukn troub","investigate trou","*invest unknown","invetigate","inves trouble","invest trb unkn","*invest trb unkn","inves","invest unknown","invest troulbe","*invest. trouble","invest,trouble","investigate .","invest unk","investigate trb","*invetigate","invest unk troub","investigate prob","invest problem","invest trb","invest unkn trou","*invest;trouble","inves trb","*investigate trb","*inv","inv","inv unk trouble","inv unkn trbl","*inv unk trouble","*inv trouble","nvestigate","*invstigate","*ivestigate","invstigate","ivnestigate","ivestigate","inestigate","*inv unkn trbl")
df$Description[df$Description %in% investigate] <- "investigate"
suspicious_person <- c("suspicious pers","*suspicious pers","susp per","suspicious male","suspicious per","suspicious perso","suspicous person")
df$Description[df$Description %in% suspicious_person] <- "suspicious person"
transport <- c("transport","*transport")
df$Description[df$Description %in% transport] <- "transport"
wanted_on_warr <- c("wanted on warr","*wanted on warra","wanted person","**wanted on warr","wanted on warran","*wanted person","*wanted subject","*wanted subjrct","person wanted","wanted on a warr","wanted subject")
df$Description[df$Description %in% wanted_on_warr] <- "wanted on warrant"
lab_request <- c("lab request","crime lab","crime lab reques","crime lab req","*crime lab","*crime lab req.","lab","needs crime lab","req crime lab")
df$Description[df$Description %in% lab_request] <- "lab request"
loud_music <- c("loud music","loud music/noise","noise comp music","noise music")
df$Description[df$Description %in% loud_music] <- "loud music"
destruct_property <- c("destruct propty","destruct prop","*dest of prop","destruction prop")
df$Description[df$Description %in% destruct_property] <- "destruct property"
armed_person <- c("armed person","*armed person","*armed knife","*armed with gun","armed pers")
df$Description[df$Description %in% armed_person] <- "armed person"
foot_patrol <- c("foot patrol","foot patrol")
df$Description[df$Description %in% foot_patrol] <- "foot patrol"
narcotics <- c("narcotics outside","narcotics inside","narcotics","narcotics onview","**narcotics purc","*narcotics","foound narcotics","found narcotics")
df$Description[df$Description %in% narcotics] <- "narcotics"
df$Description[df$Description == "aggrav assault"] <- "aggravated assault"
# extract only records of top 20 call types
description <- as.data.frame(sort(table(df$Description), decreasing=T))
df <- df %>% filter(Description %in% description$Var1[1:20])
# save tidied data
write.csv(df,"911_Police_Calls_tidied.csv", row.names=FALSE)data <- read_csv("911_Police_Calls_for_Service.csv")
df <- read_csv("911_Police_Calls_tidied.csv")I start by exploring the dataset a bit more through visualizations. The first thing I did was plot the time trend of monthly 911 calls by police district. I noticed something totally unexpected in this plot. The number of calls per month rose in early 2019 and then dropped drastically in the following two or three months.
# visualize time trends of 911 calls
top_calls <- as.data.frame(sort(table(df$PoliceDistrict), decreasing=T))
df$CallMonth <- paste0(year(df$CallDate),"-",month(df$CallDate),"-1")
# counts by month
callsEachMonthDistrict <- as.data.frame(table(df$PoliceDistrict,df$CallMonth))
colnames(callsEachMonthDistrict) <- c("District","Month","Calls")
callsEachMonthDistrict$Month <- as.POSIXct(callsEachMonthDistrict$Month)
callsEachMonthDistrict$District <- factor(callsEachMonthDistrict$District, levels = c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern"))
ggplot(data=callsEachMonthDistrict, aes(Month, Calls, colour=District)) + geom_line() + ggtitle("Number of 911 Calls By Police District \n (July 2013 - October 2019)") + xlab("\nDate") + ylab("Number of 911 calls\n") + scale_x_datetime(labels = date_format("%Y-%m"), breaks = seq(as.POSIXct("2013-07-01"),as.POSIXct("2019-10-01"), "3 months")) + theme(axis.text.x = element_text(angle = 90), text = element_text(size=14)) + scale_y_continuous(expand = c(0, 0), limits = c(0, 20000)) + labs(color='Police district')I looked into the dataset a bit more to investigate this issue, and figured out that the months of May 2019 and June 2019 had incomplete records. There were only call records for the first week of May 2019 and the last ten days of June 2019. This may have been due to the Baltimore ransomware attack that occurred in May 2019, in which most of Baltimore’s government computer systems were largely compromised by ransomware. As a result, I made the decision to limit the analysis to dates prior to May 2019. This now leaves me with around 4.54 million call records over 2130 days (70 months/ 5 years, 10 months). This does lead to the limitation of the analysis not reflecting most recent trends of 911 calls.
# investigate reason for drastic drop in number of 911 calls
tp <- df[which(paste0(year(df$CallDate),"-",month(df$CallDate))=="2019-5"),]
unique(tp$CallDate)## [1] "2019-05-01" "2019-05-02" "2019-05-03" "2019-05-04" "2019-05-05"
## [6] "2019-05-06" "2019-05-07"
tp <- df[which(paste0(year(df$CallDate),"-",month(df$CallDate))=="2019-6"),]
unique(tp$CallDate)## [1] "2019-06-30" "2019-06-23" "2019-06-28" "2019-06-24" "2019-06-25"
## [6] "2019-06-27" "2019-06-26" "2019-06-22" "2019-06-29" "2019-06-21"
A new time series plot was generated after the filtering of the data. It looks like the number of police calls have been decreasing slightly from July 2013 up till early 2018, and then starting in early 2019 the number of police calls have increased by a moderate amount. Seasonality trends can also be observed, with more calls received in the spring and summer and fewer in the winter.
# limit analysis to dates before May 2019 due to missing data
df <- df %>% filter(CallDate < as.Date("2019-05-01"))
top_calls <- as.data.frame(sort(table(df$PoliceDistrict), decreasing=T))
df$CallMonth <- paste0(year(df$CallDate),"-",month(df$CallDate),"-1")
callsEachMonthDistrict <- as.data.frame(table(df$PoliceDistrict,df$CallMonth))
colnames(callsEachMonthDistrict) <- c("District","Month","Calls")
callsEachMonthDistrict$Month <- as.POSIXct(callsEachMonthDistrict$Month)
callsEachMonthDistrict$District <- factor(callsEachMonthDistrict$District, levels = c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern"))
p1 <- ggplot(data=callsEachMonthDistrict, aes(Month, Calls, group=District, text = paste('Month:',format(as.Date(callsEachMonthDistrict$Month), "%Y-%m"),'<br> Calls:',Calls, '<br> District:',District))) + geom_line(aes(color=District)) + ggtitle("Number of 911 Calls By Police District (July 2013 - April 2019)") + xlab("<br> Date") + ylab("Number of 911 calls <br>") + scale_x_datetime(labels = date_format("%Y-%m"), breaks = seq(as.POSIXct("2013-07-01"),as.POSIXct("2019-4-01"), "3 months")) + theme(axis.text.x = element_text(angle = 90), text = element_text(size=14)) + scale_y_continuous(expand = c(0, 0), limits = c(0, 20000)) + labs(color='Police district')
ggplotly(p1, height = 500, width=800, tooltip = c("text"))I now consider how call volumes differ by district. From the generated map below, we see some clustering of neighboring districts. The Northeastern police district receives the highest number of 911 calls, followed by the Central district and the Eastern district. The number of 911 calls received in the Southwestern, Southern, and Southeastern districts are similar, and then the Northwestern, Northern and Western districts receive the fewest calls.
# visualizing call volume on Baltimore map
pd <- readOGR(dsn = "Police_Districts.geojson")## OGR data source with driver: GeoJSON
## Source: "/Users/yifanzhang/Desktop/Academics/advdatasci-2019/homeworks/2019-712-hw2-yifzhang17/Police_Districts.geojson", layer: "Police_Districts"
## with 9 features
## It has 7 fields
data$CallDate <- as.Date(data$CallDateTime, format = "%m/%d/%Y")
data <- data %>% filter(CallDate < as.Date("2019-05-01"))
total_calls <- as.data.frame(sort(table(data$PoliceDistrict), decreasing=T))
key <- na.omit(match(pd@data$AREA_NAME, total_calls$Var1))
pd@data$total_call_counts <- as.numeric(total_calls$Freq[key])
key <- na.omit(match(pd@data$AREA_NAME, top_calls$Var1))
pd@data$top_call_counts <- as.numeric(top_calls$Freq[key])
pd@data$id <- pd@data$AREA_NAME
pd.points <- fortify(pd, region="id")
pd.df <- join(pd.points, pd@data, by="id")
centroids.df <- as.data.frame(coordinates(pd))
names(centroids.df) <- c("Longitude", "Latitude")
centroids.df$id <- pd$id
myggmap <- get_map(location="Baltimore", zoom=12)
ggmap(myggmap) +
xlab("Longitude") + ylab("Latitude") + ggtitle("Total Number of 911 Calls (Top 20 Types) by Police District") +
geom_path(data=pd.df, aes(x=long, y=lat, group=group), color="black") +
geom_polygon(data=pd.df, aes(x=long, y=lat, group=group, fill=top_call_counts), alpha=.75) +
scale_fill_gradientn("Number of 911 calls", colors=c('green', 'yellow', 'red')) +
geom_text(data=centroids.df, aes(label = id, x = Longitude, y = Latitude), size=4) + theme(text = element_text(size=14)) I’m interested in comparing the distributions of 911 call types between police districts. A Chi-Square Test for Homogeneity shows that the distribution of 911 call types does indeed differ across the nine police districts. To look more into the differences in call type distributions, I generated barplots for visualization. Each bar represents the percentage of total 911 calls a particular type of call accounts for. For each district, the percentage is calculated as the number of a particular call type in the district divided by the total number of calls in the district (also including calls not on the top 20 list). For example, the Central district received a total of 39,983 auto accident calls, and a total of 855,534 calls, so 39,983/855,534=4.7% of the 911 calls are auto accident calls. I stacked the barplots vertically to facilitate visual comparison. To judge whether a call type is more prevalent in a particular police district compared to other districts, one can look at the same-colored bar across the plots vertically. What stood out most to me in this plot is that the Eastern district has a much higher percentage of repairs/service calls compared to the other police districts.
# wrangle call type distribution data
type <- as.data.frame.matrix(table(df$PoliceDistrict,df$Description))
type_percent <- as.data.frame(100*as.matrix(type)/total_calls$Freq)
type_percent_long <- melt(as.matrix(type_percent))
type_percent_long$X1 <- factor(type_percent_long$X1,levels = top_calls$Var1)
tot_percent <- rowSums(type_percent)
tot_percent <- stack(tot_percent)
type_percent_long$X1 <- factor(type_percent_long$X1, levels = c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern"))
# test for homogeneity
chisq.test(type)##
## Pearson's Chi-squared test
##
## data: type
## X-squared = 924476, df = 152, p-value < 2.2e-16
# barplots to compare distributions of 911 call types between police districts
p2 <- ggplot(type_percent_long, aes(fill=X2, y=value, x=X2, text = paste('Call type:', X2, '<br> Percentage:', sprintf("%.1f%%",type_percent_long$value,"%",sep="")))) +
geom_bar(position="dodge", stat="identity") +
facet_wrap(~X1,nrow = 9) + theme_minimal() + xlab("") + ylab("") + labs(fill='Description') + theme(axis.text.x=element_blank()) + ggtitle("Distribution of 911 Call Types by Police District") + theme(text = element_text(size=14))
ggplotly(p2, height = 1000, width=900, tooltip = c("text")) %>%
layout(margin = list(l = 80, r = 20, b = 50, t = 80),
yaxis = list(title = paste0(c("Percentage",
rep(" ", 25),
rep("\n ", 3)),
collapse = "")))A drawback of the plot above is that some call types are fairly rare in all districts, so the bars appear very short and it is hard to make a comparison across districts. To adjust for this, I tried to use a different metric. I examined one call type at a time, and calculated what percentage of such calls come from each police district. For example, the Central district received a total of 39,983 auto accident calls, while the city of Baltimore received a total of 306,183, so 39,983/306,183=13.1% of the auto accident calls are from the Central district. The percentages for the nine districts add up to 100%. A heat map of this metric is shown below. The color scheme is such that red represents that a district receives a relatively high proportion of the particular call type and blue represents that the district receives a relatively low proportion of the particular call type. Again, the Eastern district stands out as having a high volume of repairs/service calls. We can also see that the Southwestern district receives a high volume of community engagement calls, and the Western district receives a high volume of hot spot check calls.
# heatmap to compare distributions of 911 call types between police districts
std_type <- t(100*t(type)/colSums(type))
std_type_long <- melt(as.matrix(std_type))
std_type_long$X1 <- factor(std_type_long$X1, levels = c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern"))
p3 <- ggplot(std_type_long, aes(X1, X2, fill= value,
text = paste('Police district:', X1, '<br> Call type:', X2, '<br> Percentage:', sprintf("%.1f%%",value,"%",sep="")))) + geom_tile() + scale_fill_gradient2(low="navy", mid="white", high="red", midpoint=100/9, name = "Percentage", limits = c(0,100)) + xlab("\n Police district") + ylab("Call type\n") +
ggtitle("Distribution of 911 Call Types by Police District") + theme(text = element_text(size=13)) +
scale_y_discrete(limits = rev(levels(std_type_long$X2)))
ggplotly(p3, height = 800, width=1100, tooltip = c("text")) %>%
layout(margin = list(l = 80, r = 80, b = 50, t = 80),
yaxis = list(title = paste0(c("Call type",
rep(" ", 25),
rep("\n ", 3)),
collapse = "")))The Eastern district receives the most repairs/service calls, and with most repairs/service calls being labeled as non-emergency calls, the Eastern district has a higher proportion of 911 calls that are labeled as non-emergency compared to the other police districts.
# compare distributions of 911 call priorities between police districts
priority <- as.data.frame(prop.table(table(df$PoliceDistrict,df$Priority),1))
priority$Var2 <- factor(priority$Var2,levels = rev(c("Out of Service","Non-Emergency","Low","Medium","High","Emergency")))
priority$Var1 <- factor(priority$Var1,levels = rev(c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern")))
ggplot(data=priority, aes(x=Var1, y=Freq, fill=Var2)) + geom_bar(stat="identity",position=position_dodge()) + scale_fill_manual(values=rev(c('black','steelblue','limegreen','gold','orange','red'))) + coord_flip() + xlab("Police district \n") + ylab("Proportion") + ggtitle("Distribution of 911 Call Priorities by Police District") + labs(fill='Priority') + guides(fill = guide_legend(reverse = TRUE)) + theme(text = element_text(size=13)) It is also possible to look at the time trends of the call types. I chose to focus only on the three calls types (repairs/service, community engagement and hot spot check) that I had identified as being considerably more prevalent in a particular district. With repairs/service calls, it seems that the Eastern district has consistently through time received many more calls of this type. With community engagement calls, the first ever call of this type was recorded in June 2018 in the Western district, but it wasn’t until January 2019 when calls of this type started to become more common. The Southwestern district received the most community engagement calls since then. The volume of hot spot check calls over time follows a similar pattern, with the first hot spot check call being recorded in April 2018, and since then the Western district has shown the most increase in hot spot check calls.
# visualize time trends of distribution of call types
callsEachMonthDistrictType <- as.data.frame(table(df$PoliceDistrict,df$CallMonth, df$Description))
colnames(callsEachMonthDistrictType) <- c("District","Month","Type","Calls")
callsEachMonthDistrictType$Month <- as.POSIXct(callsEachMonthDistrictType$Month)
callsEachMonthDistrictType$District <- factor(callsEachMonthDistrictType$District, levels = c("Central", "Southeastern", "Eastern", "Northeastern", "Northern", "Northwestern", "Western", "Southwestern", "Southern"))
ggplot(callsEachMonthDistrictType, aes(x=Month, y=Calls, group=District)) + geom_line(aes(color=District)) +
facet_wrap(~Type,scales = "free", ncol = 4) + xlab("Date") + ylab("Number of calls") + labs(color='Police district') +
theme(text = element_text(size=14), axis.text.x=element_blank(),axis.ticks.x=element_blank()) repairs_service <- callsEachMonthDistrictType %>% filter(Type=="repairs/service")
p4 <- ggplot(data=repairs_service, aes(Month, Calls, group=District, text = paste('Month:',format(as.Date(repairs_service$Month), "%Y-%m"),'<br> Calls:',Calls, '<br> District:',District))) + geom_line(aes(color=District)) + ggtitle("Number of Repairs/service Calls By Police District (July 2013 - April 2019)") + xlab("\nDate") + ylab("Number of calls\n") + scale_x_datetime(labels = date_format("%Y-%m"), breaks = seq(as.POSIXct("2013-07-01"),as.POSIXct("2019-04-01"), "3 months")) + theme(axis.text.x = element_text(angle = 90), text = element_text(size=11)) + scale_y_continuous(expand = c(0, 0)) + labs(color='Police district')
ggplotly(p4, height = 500, width=800, tooltip = c("text"))community_engage <- callsEachMonthDistrictType %>% filter(Type=="community engage")
p5 <- ggplot(data=community_engage, aes(Month, Calls, group=District, text = paste('Month:',format(as.Date(community_engage$Month), "%Y-%m"),'<br> Calls:',Calls, '<br> District:',District))) + geom_line(aes(color=District)) + ggtitle("Number of Community Engage Calls By Police District (July 2013 - April 2019)") + xlab("\nDate") + ylab("Number of calls\n") + scale_x_datetime(labels = date_format("%Y-%m"), breaks = seq(as.POSIXct("2013-07-01"),as.POSIXct("2019-04-01"), "3 months")) + theme(axis.text.x = element_text(angle = 90), text = element_text(size=11)) + scale_y_continuous(expand = c(0, 0)) + labs(color='Police district')
ggplotly(p5, height = 500, width=800, tooltip = c("text"))hot_spot_check <- callsEachMonthDistrictType %>% filter(Type=="hot spot check")
p6 <- ggplot(data=hot_spot_check, aes(Month, Calls, group=District, text = paste('Month:',format(as.Date(hot_spot_check$Month), "%Y-%m"),'<br> Calls:',Calls, '<br> District:',District))) + geom_line(aes(color=District)) + ggtitle("Number of Hot Spot Check Calls By Police District (July 2013 - April 2019)") + xlab("\nDate") + ylab("Number of calls\n") + scale_x_datetime(labels = date_format("%Y-%m"), breaks = seq(as.POSIXct("2013-07-01"),as.POSIXct("2019-04-01"), "3 months")) + theme(axis.text.x = element_text(angle = 90), text = element_text(size=11)) + scale_y_continuous(expand = c(0, 0)) + labs(color='Police district')
ggplotly(p6, height = 500, width=800, tooltip = c("text"))Rankings of most common call types in each police district are provided for reference. The percentages represent the proportion of total 911 calls a particular type of call accounts for in each district. For example, the Central district received a total of 39,983 auto accident calls, and a total of 855,534 calls, so 39,983/855,534=4.7% of the 911 calls are auto accident calls.
t(sort(type_percent[1,],decreasing = T)) %>% kable(digits = 2, col.names="Central (%)") %>% kable_styling()| Central (%) | |
|---|---|
| no voice | 16.29 |
| disorderly | 8.97 |
| traffic stop | 5.42 |
| auto accident | 4.67 |
| common assault | 3.62 |
| narcotics | 3.56 |
| larceny | 3.32 |
| other | 3.05 |
| investigate | 2.55 |
| silent alarm | 1.80 |
| hit and run | 1.29 |
| hot spot check | 1.17 |
| destruct property | 1.09 |
| family disturbance | 1.07 |
| burglary | 0.96 |
| business check | 0.80 |
| suspicious person | 0.79 |
| auto theft | 0.60 |
| repairs/service | 0.26 |
| community engage | 0.07 |
t(sort(type_percent[2,],decreasing = T)) %>% kable(digits = 2, col.names="Eastern (%)") %>% kable_styling()| Eastern (%) | |
|---|---|
| repairs/service | 15.70 |
| no voice | 14.99 |
| traffic stop | 7.12 |
| disorderly | 6.14 |
| narcotics | 4.24 |
| auto accident | 3.56 |
| common assault | 3.43 |
| investigate | 2.35 |
| other | 2.30 |
| family disturbance | 2.12 |
| larceny | 2.02 |
| silent alarm | 1.60 |
| burglary | 1.40 |
| hit and run | 1.24 |
| destruct property | 1.10 |
| business check | 0.90 |
| hot spot check | 0.73 |
| auto theft | 0.56 |
| suspicious person | 0.48 |
| community engage | 0.14 |
t(sort(type_percent[3,],decreasing = T)) %>% kable(digits = 2, col.names="Northeastern (%)") %>% kable_styling()| Northeastern (%) | |
|---|---|
| no voice | 21.06 |
| disorderly | 9.61 |
| traffic stop | 8.64 |
| auto accident | 6.38 |
| common assault | 5.43 |
| silent alarm | 5.08 |
| family disturbance | 4.27 |
| larceny | 4.01 |
| other | 3.72 |
| investigate | 3.54 |
| narcotics | 2.86 |
| burglary | 2.66 |
| hit and run | 2.55 |
| business check | 2.24 |
| destruct property | 2.09 |
| suspicious person | 2.06 |
| auto theft | 1.44 |
| hot spot check | 1.03 |
| repairs/service | 0.27 |
| community engage | 0.09 |
t(sort(type_percent[4,],decreasing = T)) %>% kable(digits = 2, col.names="Northern (%)") %>% kable_styling()| Northern (%) | |
|---|---|
| no voice | 10.18 |
| disorderly | 8.80 |
| auto accident | 5.86 |
| traffic stop | 5.33 |
| silent alarm | 4.33 |
| common assault | 3.66 |
| larceny | 3.43 |
| other | 2.88 |
| business check | 2.50 |
| narcotics | 2.49 |
| investigate | 2.14 |
| burglary | 1.99 |
| family disturbance | 1.96 |
| suspicious person | 1.80 |
| hit and run | 1.74 |
| destruct property | 1.36 |
| repairs/service | 1.32 |
| auto theft | 0.87 |
| hot spot check | 0.71 |
| community engage | 0.11 |
t(sort(type_percent[5,],decreasing = T)) %>% kable(digits = 2, col.names="Northwestern (%)") %>% kable_styling()| Northwestern (%) | |
|---|---|
| no voice | 14.71 |
| disorderly | 8.77 |
| narcotics | 5.79 |
| traffic stop | 5.72 |
| auto accident | 4.79 |
| common assault | 3.97 |
| silent alarm | 3.64 |
| larceny | 3.11 |
| investigate | 2.52 |
| family disturbance | 2.51 |
| other | 2.12 |
| burglary | 2.00 |
| hit and run | 1.76 |
| business check | 1.48 |
| destruct property | 1.33 |
| suspicious person | 1.23 |
| auto theft | 1.06 |
| hot spot check | 0.86 |
| repairs/service | 0.58 |
| community engage | 0.13 |
t(sort(type_percent[6,],decreasing = T)) %>% kable(digits = 2, col.names="Southeastern (%)") %>% kable_styling()| Southeastern (%) | |
|---|---|
| no voice | 12.30 |
| disorderly | 9.98 |
| traffic stop | 7.73 |
| auto accident | 5.56 |
| larceny | 4.53 |
| other | 4.46 |
| common assault | 4.35 |
| silent alarm | 3.98 |
| narcotics | 3.32 |
| investigate | 2.56 |
| business check | 2.26 |
| family disturbance | 2.16 |
| burglary | 2.14 |
| hit and run | 1.97 |
| suspicious person | 1.84 |
| destruct property | 1.75 |
| hot spot check | 1.40 |
| auto theft | 0.98 |
| repairs/service | 0.39 |
| community engage | 0.09 |
t(sort(type_percent[7,],decreasing = T)) %>% kable(digits = 2, col.names="Southern (%)") %>% kable_styling()| Southern (%) | |
|---|---|
| no voice | 14.20 |
| disorderly | 10.78 |
| traffic stop | 6.76 |
| narcotics | 6.14 |
| common assault | 5.61 |
| auto accident | 5.27 |
| larceny | 3.74 |
| silent alarm | 3.29 |
| other | 3.15 |
| investigate | 3.03 |
| burglary | 2.80 |
| family disturbance | 2.70 |
| destruct property | 1.91 |
| business check | 1.87 |
| hit and run | 1.84 |
| suspicious person | 1.61 |
| auto theft | 1.17 |
| hot spot check | 0.84 |
| repairs/service | 0.19 |
| community engage | 0.04 |
t(sort(type_percent[8,],decreasing = T)) %>% kable(digits = 2, col.names="Southwestern (%)") %>% kable_styling()| Southwestern (%) | |
|---|---|
| no voice | 17.67 |
| disorderly | 8.84 |
| traffic stop | 8.13 |
| narcotics | 5.80 |
| auto accident | 4.94 |
| common assault | 4.93 |
| family disturbance | 3.62 |
| investigate | 3.16 |
| other | 3.13 |
| larceny | 3.06 |
| silent alarm | 2.94 |
| burglary | 2.48 |
| hot spot check | 2.47 |
| hit and run | 2.11 |
| destruct property | 1.73 |
| business check | 1.57 |
| auto theft | 1.22 |
| suspicious person | 1.20 |
| repairs/service | 0.70 |
| community engage | 0.38 |
t(sort(type_percent[9,],decreasing = T)) %>% kable(digits = 2, col.names="Western (%)") %>% kable_styling()| Western (%) | |
|---|---|
| no voice | 14.75 |
| narcotics | 9.52 |
| traffic stop | 9.32 |
| disorderly | 7.47 |
| hot spot check | 4.84 |
| common assault | 4.43 |
| auto accident | 3.50 |
| family disturbance | 2.73 |
| other | 2.68 |
| investigate | 2.66 |
| business check | 2.20 |
| larceny | 2.12 |
| burglary | 1.91 |
| silent alarm | 1.71 |
| hit and run | 1.49 |
| destruct property | 1.30 |
| auto theft | 0.88 |
| suspicious person | 0.64 |
| repairs/service | 0.19 |
| community engage | 0.11 |
Summarize your findings here.
The main goal of this analysis was to investigate whether the distribution of 911 call types differs across the nine police districts in Baltimore. Baltimore 911 call records from July 1, 2013 to April 30, 2019 were used for the analysis. Only the top 20 most prevalent call types during this time period were considered. These calls accounted for 73.3% of all the 911 calls received, so most of the information is retained, but there could be variations in other less common call types that are not captured by this analysis. A Chi-Square Test for Homogeneity showed that the distribution of 911 call types does indeed differ across the nine police districts. Most noticeable differences across districts included the Eastern district receiving an extremely high volume of repairs/service calls, the Southwestern district receiving a moderately high volume of community engagement calls, and the Western district receiving a moderately high volume of hot spot check calls. Tables of the proportions of total 911 calls each particular type of call accounts for are provided in the appendix for each district. These numbers can be used as a reference for recruiting or assigning officers with certain expertise to districts in which there is greatest need. For example, the Southwestern and Western district may need a larger patrol unit to cover the higher level of proactive monitoring activity.
The current analysis focuses on the aggregate call data over the past six years, with a brief look into the time trends of the 911 calls. The time series plots showed that the Eastern district has consistently through time received many more repairs/service calls compared to other districts. Community engagement calls first emerged in June 2018 and it wasn’t until January 2019 when such calls were recorded in all districts. Hot spot check calls also emerged fairly recently starting in April 2018. Another type of call that emerged in 2018 is business check calls. The emergence of these three types of calls contribute to the moderate increase in 911 call volume since late 2018, and exemplifies the increased proactive efforts of the BPD. Previous to the emergence of these calls, the Northeastern district consistently received a higher volume of 911 calls, but since the emergence of these new call types, other districts have started to surpass the Northeastern district in monthly calls received. For example, the Southwestern district has received the most community engagement calls, and in March and April of 2019 had a higher call volume than the Northeastern district. Similarly, the Western district has received the most hot spot check calls, and has had a higher call volume than the Northeastern district since January 2019. More data beyond April 2019 will be needed to assess the trends of community engagement, hot spot check, and business check calls and judge how sustainable these proactive efforts are given the shortage of police officers. It will also be interesting in a future analysis to explore in greater depth and detail how the distribution of call types has changed over time, and whether the changes have been due to enacted policies.